{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ba1db53d-7034-4939-a348-00010503a791",
   "metadata": {},
   "source": [
    "# Lesson 12: Visual Question & Answering"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8940c911",
   "metadata": {},
   "source": [
    "- In the classroom, the libraries are already installed for you.\n",
    "- If you would like to run this code on your own machine, you can install the following:\n",
    "\n",
    "```\n",
    "    !pip install transformers\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea7c87fb-c19c-415f-9601-23b3df1407f1",
   "metadata": {},
   "source": [
    "- Here is some code that suppresses warning messages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "408b6348-302b-4305-9832-17c627b4dfad",
   "metadata": {
    "height": 98
   },
   "outputs": [],
   "source": [
    "from transformers.utils import logging\n",
    "logging.set_verbosity_error()\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\", message=\"Using the model-agnostic default `max_length`\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed8d41c7-3a0d-4205-b47b-9a1e6215bc1e",
   "metadata": {},
   "source": [
    "* Load the Model and the Processor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "504c2a05-19bf-4785-a1a0-72d282d7055b",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "from transformers import BlipForQuestionAnswering"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31ee7551-f403-402e-96a2-1e2b4458b8ca",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "model = BlipForQuestionAnswering.from_pretrained(\n",
    "    \"./models/Salesforce/blip-vqa-base\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b673bd3",
   "metadata": {},
   "source": [
    "Info about [Salesforce/blip-vqa-base](https://huggingface.co/Salesforce/blip-vqa-base)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "117bb100-a107-419e-a43a-be34525a4ca8",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "from transformers import AutoProcessor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "806431d5-334c-4d17-90a7-2c9ac763e64f",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "processor = AutoProcessor.from_pretrained(\n",
    "    \"./models/Salesforce/blip-vqa-base\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c130ee3-e6af-4f40-8cf8-00beb908427b",
   "metadata": {},
   "source": [
    "- Load the image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "532a6ce6-34f3-4216-82dc-61adcf237a58",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "from PIL import Image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2cb1551d-9962-4692-94a6-1611a51495ab",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "image = Image.open(\"./beach.jpeg\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "165a2d53-e80d-4df4-9cc1-e6b685978a18",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "image"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e5224486-c9e8-4d9a-a8f3-779f03bc652e",
   "metadata": {},
   "source": [
    "- Write the `question` you want to ask to the model about the image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "73c0677c-44a8-4cc2-b2e5-b9f84c8451c3",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "question = \"how many dogs are in the picture?\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "71480588-f901-47aa-8212-34bda635e38b",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "inputs = processor(image, question, return_tensors=\"pt\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bca3645c-fa84-48ef-8fc0-938b6894e5b0",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "out = model.generate(**inputs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d7d2bfcf-e9c7-4b4e-9f54-5ea0526d7d41",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "print(processor.decode(out[0], skip_special_tokens=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6550904f",
   "metadata": {},
   "source": [
    "### Try it yourself! \n",
    "- Try this model with your own images and questions!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
